Week 1 - Gravitational Wave Data Analysis


Table of Contents

  1. Question 1
  2. Question 2
  3. Question 3

This is the notebook for the first week of the Big Data in Physics practical. The explanation of the code is given in markdown above the cells in question; sometimes additional explanation is given as comments.

Question 1: Matched Filter

Q1 - First Part

Here I tried to use both the local data from the download and the data accessible through the Merger function. The time series look very different, which is surprising considering they should be pretty much the same, but they do result in very similar-looking PSDs and mass values. I decided to go with the grid data as it was more practical, and since some of the code in the hints showed slight use of the Merger data, I thought I would just fully use the time series given by it.

The next two cells simply import the data, filter it with a highpass filter to remove low-frequency noise that is not useful for this analysis, and plot it using Plotly for interactive plots. All of the plots are available as standalone HTML files that can be embedded into any webpage and remain interactive.
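As a rough sketch of what the highpass step does (using SciPy rather than PyCBC's `highpass`, with a made-up strain and an assumed 4096 Hz sample rate):

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 4096.0                      # assumed sample rate in Hz
t = np.arange(0, 4, 1 / fs)
# toy "strain": a 5 Hz drift we want to remove plus a 100 Hz component we keep
data = np.sin(2 * np.pi * 5 * t) + 0.1 * np.sin(2 * np.pi * 100 * t)

# 4th-order Butterworth highpass at 20 Hz, applied forward and backward
b, a = butter(4, 20.0, btype="highpass", fs=fs)
filtered = filtfilt(b, a, data)
```

After filtering, the low-frequency drift is gone and only the small 100 Hz component survives.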

Q1 - Second Part

In this part we simply calculate the power spectral density of the Livingston and Hanford data. We use 2-second data segments for the Welch method, which produces our PSD.

The last method, inverse_spectrum_truncation, conditions the PSD so that it does not include frequencies below 20 Hz, since this is the cutoff of the highpass filter that we applied to the data in the last section.

We then simply plot the resulting PSDs together.
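A minimal stand-in for this PSD estimate, using `scipy.signal.welch` with 2-second segments on toy white noise (the infinite values below 20 Hz mimic the low-frequency cutoff; the real `inverse_spectrum_truncation` call also controls the length of the inverse filter):

```python
import numpy as np
from scipy.signal import welch

fs = 4096.0
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, int(32 * fs))   # 32 s of toy white noise

# Welch estimate with 2-second segments, as used in the analysis above
freqs, psd = welch(noise, fs=fs, nperseg=int(2 * fs))

# below the 20 Hz highpass cutoff the PSD is not trusted, so blow it up
psd[freqs < 20.0] = np.inf
```

For unit-variance white noise the one-sided PSD should sit near the flat level 2/fs above the cutoff.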

The next step consists of calculating the estimated mass of the objects in the binary system (we assume both have the same mass). From the hints, we know the mass is in the range 1.3 to 1.5 solar masses. We create a loop that goes through every value in this range with a specific step size and see which mass gives the highest SNR, which would indicate that this mass is the correct value. In this case we focus on the H1 detector, as it is the one we use in the next steps.

We found that the mass is 1.38 solar masses and that the corresponding SNR is 9.56. If we do the same for the L1 detector, we find a value of 1.36 solar masses and an SNR of around 34.
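The structure of that mass loop can be shown with a toy example in which a sinusoid frequency stands in for the mass and a plain dot product stands in for the matched-filter SNR (everything here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 1024.0
t = np.arange(0, 1, 1 / fs)

def template(f):
    # stand-in for the waveform generator: a unit-norm sinusoid of frequency f
    h = np.sin(2 * np.pi * f * t)
    return h / np.linalg.norm(h)

# toy data: a loud injection at the "true" parameter plus white noise
true_f = 138.0
data = 8.0 * template(true_f) + rng.normal(0, 1, t.size)

# grid search, as in the mass loop: keep the parameter with the largest SNR
grid = np.arange(130.0, 150.0, 1.0)
snrs = [abs(np.dot(template(f), data)) for f in grid]
best = grid[int(np.argmax(snrs))]
```

`best` recovers the injected parameter, just as the loop over 1.3-1.5 solar masses picks out 1.38.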

The next steps are a little more complicated in my understanding: the code above uses a waveform defined through its frequency-domain representation, but all of the examples I could find on PyCBC used time series to create the waveform (get_td_waveform instead of get_fd_waveform). I therefore tried to convert the code I found to the frequency domain, but this creates complex values and interferes with the time position of the merger. It was also sometimes tricky to understand whether I should use strain or stilde (the time and frequency series, respectively) in certain contexts, or whether cyclic_time_shift still applied, so some weird behaviour probably stems from that.

It's a shame that this merger and the one given as an example are so different, as it was difficult for someone not initiated to understand which variables should change or not (filter lengths and values, delta_f, the differences between approximants, which are complicated, etc.). Giving a merger with more similarities would probably have made the exercise less dense, as there is a lot of theory and library information to learn in, theoretically, one week.

The code below creates a frequency-domain waveform with the mass found above and then shifts the data so that the merger can be found at the beginning of the array. This is the purpose of cyclic_time_shift, which treats the data as a circle and can shift the start around. get_td_waveform creates the series so that the merger is at time zero, so that is what we have to give the function. This is where I'm not sure, however, because I use this on the output of get_fd_waveform instead and I don't know if that is allowed. There were no errors, so I left it like this, but the template is now a series of complex numbers instead of real numbers.
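The effect of a cyclic time shift can be reproduced with plain NumPy: for an integer number of samples, rolling the array in the time domain and applying a phase ramp in the frequency domain are the same operation, which is why the shift can meaningfully be applied to a frequency-domain template too (a toy sketch, not the PyCBC implementation):

```python
import numpy as np

fs = 16.0
x = np.zeros(64)
x[40] = 1.0                      # a "merger" spike at sample 40

# time-domain cyclic shift: move the spike to the start of the array
shifted = np.roll(x, -40)

# equivalent frequency-domain shift: multiply by a phase ramp exp(2j*pi*f*dt)
f = np.fft.rfftfreq(x.size, d=1 / fs)
dt = 40 / fs
shifted_fd = np.fft.irfft(np.fft.rfft(x) * np.exp(2j * np.pi * f * dt), n=x.size)
```

Both routes put the spike at index 0, and they agree to numerical precision.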

The code then applies the matched filter, using the template we created and the frequency series of our data. The filtering process can corrupt the beginning and end of the data, so we chop away 8 seconds at the beginning and 4 at the end to remove these undesired structures. We then plot the SNR, taking into account that it is a complex series, and find the maximum value.

Thanks to the SNR peak and all the steps above, we will now try to fit the template onto the noisy data: the time, amplitude, and phase of the SNR peak tell us how to align the two.

The first step is to shift the template to the right spot and then scale it so that it would have an SNR of 1 if it were present in the data; this is done using the sigma function. The next step is to scale the template's amplitude and phase to the peak value.
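The scaling step amounts to multiplying a unit-norm template by the complex SNR peak: its magnitude sets the amplitude and its argument sets the phase (the peak value below is made up; in PyCBC the unit normalisation comes from `sigma`):

```python
import numpy as np

# complex SNR peak, as read off the matched-filter output (made-up value)
peak_snr = 9.56 * np.exp(1j * 0.7)

# a toy unit-norm complex template
template = np.exp(2j * np.pi * 0.1 * np.arange(128))
template /= np.linalg.norm(template)

# scale the amplitude and rotate the phase so the template overlays the data
aligned = template * peak_snr
```

After scaling, the template's norm equals the SNR peak magnitude, which is what makes it comparable to the data.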

The next step in the process is to whiten the template and the data and then bandpass them between 30 and 300 Hz. In this way, any signal that is in the data is transformed in the same way as the template and will match it. We then simply select the merger region from both the template and the data and superpose them in a plot to see the match.
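Whitening divides each frequency bin by the estimated amplitude spectral density, so coloured noise comes out flat; a NumPy/SciPy sketch on toy coloured noise (the 30-300 Hz bandpass mirrors the text; none of this is the PyCBC whitening code):

```python
import numpy as np
from scipy.signal import welch, butter, filtfilt

fs = 2048.0
rng = np.random.default_rng(3)

# toy coloured noise: white noise through a lowpass, so the PSD slopes down
b, a = butter(2, 200.0, btype="lowpass", fs=fs)
coloured = filtfilt(b, a, rng.normal(0, 1, int(64 * fs)))

# estimate the PSD, interpolate it onto the FFT bins, and divide by the ASD
freqs, psd = welch(coloured, fs=fs, nperseg=int(4 * fs))
fbins = np.fft.rfftfreq(coloured.size, d=1 / fs)
asd = np.sqrt(np.interp(fbins, freqs, psd))
whitened = np.fft.irfft(np.fft.rfft(coloured) / asd, n=coloured.size)

# bandpass the whitened series between 30 and 300 Hz, as in the text
b2, a2 = butter(4, [30.0, 300.0], btype="bandpass", fs=fs)
banded = filtfilt(b2, a2, whitened)
```

The coloured input has a strongly sloped spectrum, while the whitened output is flat across frequency bands.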

I am pretty sure this is not what the plot should look like, but I never managed to get better results except by choosing a mass much higher than what we found (36 solar masses). I did look at the merger, however, and it is an event of two neutron stars with masses approximately equal to what we found, so I don't know where it went wrong.

The last step of this part is to look at the Q-transforms of the whitened data. These are very simple to implement with PyCBC and can be seen below.

Q1 - Third Part

In this part of the question we investigate the correlation between close waveforms, or more specifically, how well they match.

The first step is to create the base waveform of known mass; this is the one that will be used as a reference. The code then loops over a certain range of masses and at each iteration creates a new waveform with that mass. The two waveforms are then resized to the same length. The aLIGO PSD is then generated and used to compute the match between the two waveforms. These matches are stored in an array and plotted, showing the highest similarity between waveforms of the same mass, as expected.
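With a flat PSD, the match is just the normalised overlap maximised over a relative time shift; a toy version with a made-up chirp-like waveform whose sweep rate plays the role of the mass (not `pycbc.filter.match`):

```python
import numpy as np

fs = 1024.0
t = np.arange(0, 1, 1 / fs)

def waveform(m):
    # made-up chirp-like signal whose sweep rate depends on a "mass" m
    return np.sin(2 * np.pi * (50.0 + 10.0 * m) * t**2)

def match(a, b):
    # normalised overlap, maximised over a circular time shift
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    corr = np.fft.irfft(np.fft.rfft(a) * np.conj(np.fft.rfft(b)), n=a.size)
    return float(np.abs(corr).max())

base = waveform(1.4)
masses = np.arange(1.3, 1.51, 0.02)
matches = [match(base, waveform(m)) for m in masses]
best = masses[int(np.argmax(matches))]
```

As in the notebook's plot, the match curve peaks at the base mass and falls off on either side.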


Question 2: Detector Noise

Q2 - First Part

Same steps as above to create the PSD: this cell loads the data and prepares it in the time and frequency domains.

As in the earlier question, we create the waveform and matched-filter it to get the SNR. In this case we get an SNR of 5.18, meaning that there is no signal in our data set, only noise. This lets us check whether the noise is white, meaning it should be distributed as a Gaussian with zero mean. We also plot the SNR.

Q2 - Second Part

This part investigates the distribution of the noise in our data. First we whiten the data, interpolating the PSD obtained with the Welch method.

Now we want to represent this whitened data as a distribution, so we calculate the standard deviation $\sigma$ and the mean $\mu$ and plot the distribution. As we can see, the mean is very close to zero and the distribution looks exactly like a Gaussian. The distribution is also normalized before plotting.
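The statistics and the normalised histogram can be sketched on ideal white Gaussian noise standing in for the whitened strain (made-up data, not the detector output):

```python
import numpy as np

rng = np.random.default_rng(4)
# stand-in for the whitened strain: stationary white Gaussian noise
whitened = rng.normal(0.0, 1.0, 200_000)

mu = whitened.mean()
sigma = whitened.std()

# normalised histogram and the Gaussian with the measured mu and sigma
counts, edges = np.histogram(whitened, bins=100, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
gauss = np.exp(-0.5 * ((centres - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
```

For truly Gaussian input the histogram tracks the fitted curve closely, which is the comparison the notebook's plot makes.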

Q2 - Third Part

We are now going to demonstrate that the noise is not distributed as stationary Gaussian noise. Instead of using the whole data set to look at the SNR, we will repeat the same steps as above on smaller data chunks and then look at the resulting distribution. In this cell we cut the data into 1000 slices that we analyze.

As we can see from the distribution, when applied to smaller chunks the SNRs drawn from the data are not similar at all; the noise is therefore not stationary but varies throughout the dataset.
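The slicing itself is a simple reshape; for truly stationary Gaussian noise the per-chunk statistics cluster tightly, which is exactly the behaviour the real data fails to show (toy data and made-up sizes):

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(0.0, 1.0, 100_000)

# cut the data into 1000 equal slices and summarise each one separately
chunks = data.reshape(1000, -1)
chunk_means = chunks.mean(axis=1)
chunk_stds = chunks.std(axis=1)

# for stationary noise the spread of per-chunk deviations stays small
spread = chunk_stds.max() - chunk_stds.min()
```

Real detector noise would show a far wider spread of per-chunk statistics than this stationary toy.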


Question 3: Horizon Distance of Detectors

Q3 - First Part

In this section we investigate the horizon distance of different gravitational wave detectors.

In this cell we import the prebuilt PSD data from the PyCBC library. Here we import data from aLIGO and restrict its frequency range by setting to infinity all the frequencies outside the wanted range. We then plot the PSD of the observatory.

We now create a waveform that can take multiple mass values, to analyze how the SNR seen by the detector changes with the mass of the binary system. We only keep the values that are > 8, as they represent an actual detection, and scale them by dividing by 8. We then plot those values, called the horizon distance.
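The rescaling works because the SNR falls off as the inverse of the distance, so a system seen with SNR > 8 at the reference distance would drop to the threshold of 8 at distance d_ref × SNR/8; a sketch with made-up SNR values (the real numbers come from matched filtering against the detector PSD):

```python
import numpy as np

snr_threshold = 8.0
d_ref = 100.0    # Mpc; assumed reference distance of the generated waveforms

# made-up SNR-vs-mass values at the reference distance
masses = np.array([10.0, 30.0, 100.0, 300.0])
snr_ref = np.array([12.0, 40.0, 90.0, 25.0])

# keep only detectable systems (SNR > 8) and rescale: SNR ~ 1/distance,
# so the horizon is where the rescaled SNR reaches the threshold
detectable = snr_ref > snr_threshold
horizon = d_ref * snr_ref[detectable] / snr_threshold
```

Plotting `horizon` against `masses[detectable]` gives the horizon-distance curve described above.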

Q3 - Second Part

The horizon distance is a good tool to analyze how efficiently a GW observatory can detect signals from binary sources of different masses. In the example above we only used waveforms at the same distance, but the distance can also be varied along with the mass to create a map of an observatory's sensitivity. In this case, we can see that the SNR is quite low both for low masses and for very massive objects. These are the limitations of aLIGO at a distance of 1000 Mpc: for objects at this distance, aLIGO is most efficient at detecting signals emitted by binary systems of combined mass ~100 solar masses.

Q3 - Third Part

In this part, we repeat the same analysis as in part 1, but for the Einstein Telescope.

We can see that, compared to aLIGO at the same distance, the Einstein Telescope has a much wider range of visibility and a maximum for systems with four times the mass of what aLIGO can detect.